Abstract. This paper deals with several issues related to the pointwise consistency
of the kriging predictor when the mean and the covariance functions are known. 
These questions are of general importance in the context of computer experiments. 
The analysis is based on the properties of approximations in reproducing kernel 
Hilbert spaces. We fix an erroneous claim of Yakowitz and Szidarovszky (J. Mul- 
tivariate Analysis, 1985) that the kriging predictor is pointwise consistent for all 
continuous sample paths under some assumptions. 

1 Introduction

The domain of computer experiments is concerned with making inferences about 
the output of an expensive-to-run numerical simulation of some physical system, 
which depends on a vector of factors with values in $X \subset \mathbb{R}^d$. The output of the
simulator is formally an unknown function $f : X \to \mathbb{R}$. For example, to comply with
ever-increasing standards regarding pollutant emissions, numerical simulations are 
used to determine the level of emissions of a combustion engine as a function of its 
design parameters (Villemonteix, 2008). The emission of pollutants by an engine in- 
volves coupled physical phenomena whose numerical simulation by a finite-element 
method, for a fixed set of design parameters of the engine, can take several hours on 
high-end servers. It then becomes very helpful to collect the answers already pro- 
vided by the expensive simulator, and to construct from them a simpler computer 
model that will provide approximate but cheaper answers about a quantity of
interest. This approximate model is often called a surrogate, or a metamodel, or an
emulator of the actual simulator $f$. The quality of the answers given by the approximate
model depends on the quality of the approximation, which depends, in turn and in
part, on the choice of the evaluation points of $f$, also called experiments. The choice
of the evaluation points is usually called the design of experiments. Assuming that $f$
is continuous, it is an important question to know whether the approximate model
behaves consistently, in the sense that if the evaluation points $x_n$ are chosen
sequentially in such a way that a given point $x \in X$ is an accumulation point of
$\{x_n, n \ge 1\}$, then the approximation at $x$ converges to $f(x)$.

Since the seminal paper of Sacks et al. (1989), kriging has been one of the most 
popular methods for building approximations in the context of computer
experiments (see, e.g., Santner et al., 2003). In the framework of kriging, the unknown
function $f$ is seen as a sample path of a stochastic process $\xi$, which turns the
problem of approximation of $f$ into a prediction problem for the process $\xi$. In this paper,
we shall assume that the mean and the covariance functions are known. Motivated
by the analysis of the expected improvement algorithm (Vazquez and Bect, 2009),
a popular kriging-based optimization algorithm, we discuss several issues related
to the pointwise consistency of the kriging predictor, that is, the convergence of the
kriging predictor to the true value of $\xi$ at a fixed point $x \in X$. These issues are barely
documented in the literature, and we believe them to be of general importance for 
the asymptotic analysis of sequential design procedures based on kriging. 

The paper is organized as follows. Section 2 introduces notation and various 
formulations of pointwise consistency, using the reproducing kernel Hilbert space 
(RKHS) attached to $\xi$. Section 3 investigates whether $L^2$-pointwise consistency at $x$
can hold when $x$ is not in the closure of the set $\{x_n, n \ge 1\}$. Conversely, assuming
that $x$ is in the closure, Section 4 studies the set of sample paths $f = \xi(\omega, \cdot)$
for which pointwise consistency holds. In particular, we fix an erroneous claim of 
Yakowitz and Szidarovszky (1985) — namely, that the kriging predictor is pointwise 
consistent for all continuous sample paths under some assumptions. 

2 Several formulations of pointwise consistency 

Let $\xi$ be a second-order process defined on a probability space $(\Omega, \mathcal{A}, P)$, with
parameter $x \in X \subset \mathbb{R}^d$. Without loss of generality, it will be assumed that the mean
of $\xi$ is zero and that $X = \mathbb{R}^d$. The covariance function of $\xi$ will be denoted by
$k(x,y) := E[\xi(x)\,\xi(y)]$, and the following assumption will be used throughout the
paper:

Assumption 1. The covariance function k is continuous. 

The kriging predictor of $\xi(x)$, based on the observations $\xi(x_i)$, $i = 1, \ldots, n$, is the
orthogonal projection

$$\hat\xi(x;\underline{x}_n) := \sum_{i=1}^{n} \lambda^i(x;\underline{x}_n)\,\xi(x_i) \tag{1}$$

of $\xi(x)$ onto $\mathrm{span}\{\xi(x_i),\ i = 1, \ldots, n\}$, where $\underline{x}_n := (x_1, \ldots, x_n)$ collects the
first $n$ evaluation points. The variance of the prediction error, also
called the kriging variance in the literature of geostatistics (see, e.g., Chilès and
Delfiner, 1999), or the power function in the literature of radial basis functions (see,
e.g., Wu and Schaback, 1993), is

$$\sigma^2(x;\underline{x}_n) := \mathrm{var}\bigl[\xi(x) - \hat\xi(x;\underline{x}_n)\bigr]
  = k(x,x) - \sum_{i=1}^{n} \lambda^i(x;\underline{x}_n)\,k(x,x_i).$$
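The two formulas above can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in dimension one, uses the exponential covariance with exponent 1 as an arbitrary example, and obtains the weights from the normal equations of the projection (Gram matrix times weights equals the vector of covariances with $\xi(x)$). The function names `exp_cov` and `kriging` and the sample data are hypothetical and do not come from the paper.

```python
import numpy as np

def exp_cov(x, y, s=1.0, alpha=1.0, beta=1.0):
    """Exponential covariance s^2 * exp(-alpha * |x - y|^beta) in dimension one."""
    return s**2 * np.exp(-alpha * np.abs(x - y) ** beta)

def kriging(x, design, values, cov=exp_cov):
    """Zero-mean kriging predictor, kriging variance and weights at a scalar x."""
    design = np.asarray(design, dtype=float)
    K = cov(design[:, None], design[None, :])   # Gram matrix k(x_i, x_j)
    kx = cov(design, x)                         # vector of k(x, x_i)
    lam = np.linalg.solve(K, kx)                # weights lambda^i(x; x_n)
    pred = lam @ np.asarray(values, dtype=float)
    var = cov(x, x) - lam @ kx                  # k(x,x) - sum_i lambda^i k(x,x_i)
    return pred, var, lam

# Hypothetical data: three evaluation points and the observed values.
design = [0.0, 0.4, 1.0]
values = [1.0, -0.5, 2.0]
pred, var, lam = kriging(0.4, design, values)   # prediction at a design point
```

At a design point the projection reproduces the observation, so the predictor returns the observed value and the kriging variance vanishes up to rounding.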

For any $x \in \mathbb{R}^d$, and any sample path $f = \xi(\omega, \cdot)$, $\omega \in \Omega$, the values $\xi(\omega,x) =
f(x)$ and $\hat\xi(\omega,x;\underline{x}_n)$ can be seen as the result of the application of an evaluation
functional to $f$. More precisely, let $\delta_x$ be the Dirac measure at $x \in \mathbb{R}^d$, and let $\lambda_{n,x}$
denote the measure with finite support defined by
$\lambda_{n,x} := \sum_{i=1}^{n} \lambda^i(x;\underline{x}_n)\,\delta_{x_i}$. Then,
for all $\omega \in \Omega$, $\xi(\omega,x) = \langle \delta_x, f \rangle$ and
$\hat\xi(\omega,x;\underline{x}_n) = \langle \lambda_{n,x}, f \rangle$. Pointwise consistency
at $x \in \mathbb{R}^d$, defined in Section 1 as the convergence of $\hat\xi(\omega,x;\underline{x}_n)$ to $\xi(\omega,x)$, can thus
be seen as the convergence of $\lambda_{n,x}$ to $\delta_x$ in some sense.

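Let $\mathcal{H}$ be the RKHS of functions generated by $k$, and $\mathcal{H}^*$ its dual space.
Denote by $(\cdot,\cdot)_{\mathcal{H}}$ (resp. $(\cdot,\cdot)_{\mathcal{H}^*}$) the inner product of $\mathcal{H}$ (resp. $\mathcal{H}^*$), and by
$\|\cdot\|_{\mathcal{H}}$ (resp. $\|\cdot\|_{\mathcal{H}^*}$) the corresponding norm. It is well-known (see, e.g., Wu and
Schaback, 1993) that

$$\|\delta_x - \lambda_{n,x}\|^2_{\mathcal{H}^*}
  = \Bigl\| k(x,\cdot) - \sum_{i=1}^{n} \lambda^i(x;\underline{x}_n)\,k(x_i,\cdot) \Bigr\|^2_{\mathcal{H}}
  = \sigma^2(x;\underline{x}_n).$$

Therefore, the convergence $\lambda_{n,x} \to \delta_x$ holds strongly in $\mathcal{H}^*$ if and only if the
kriging predictor is $L^2(\Omega,\mathcal{A},P)$-consistent at $x$; that is, if $\sigma^2(x;\underline{x}_n)$ converges to zero.
Since $k$ is continuous, it is easily seen that $\sigma^2(x;\underline{x}_n) \to 0$ as soon as $x$ is adherent
to $\{x_n, n \ge 1\}$. Indeed,

$$\sigma^2(x;\underline{x}_n) \le E\bigl[(\xi(x) - \xi(x_{\varphi_n}))^2\bigr]
  = k(x,x) + k(x_{\varphi_n}, x_{\varphi_n}) - 2\,k(x, x_{\varphi_n}),$$

with $(\varphi_n)_{n \in \mathbb{N}}$ a non-decreasing sequence such that $\forall n \ge 1$, $\varphi_n \le n$ and $x_{\varphi_n} \to x$.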
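The bound above can be observed on a small example. The following sketch assumes a one-dimensional exponential covariance and a hypothetical sequence $x_n = x + 1/n^2$ that accumulates at $x$; for simplicity the index $\varphi_n$ is replaced by the index of the design point nearest to $x$ among the first $n$, which is one admissible choice among many. All names are illustrative.

```python
import numpy as np

def cov(x, y, alpha=1.0):
    """Exponential covariance exp(-alpha * |x - y|), continuous as in Assumption 1."""
    return np.exp(-alpha * np.abs(x - y))

def kriging_variance(x, design):
    """sigma^2(x; x_n) = k(x,x) - k_x^T K^{-1} k_x for a one-dimensional design."""
    design = np.asarray(design, dtype=float)
    K = cov(design[:, None], design[None, :])
    kx = cov(design, x)
    return cov(x, x) - kx @ np.linalg.solve(K, kx)

x = 0.3
points = x + 1.0 / np.arange(1, 41) ** 2   # x_n = x + 1/n^2, so x_n -> x

rows = []
for n in (1, 5, 10, 20, 40):
    design = points[:n]
    nearest = design[np.argmin(np.abs(design - x))]
    # Right-hand side of the bound, evaluated at the nearest design point.
    bound = cov(x, x) + cov(nearest, nearest) - 2 * cov(x, nearest)
    rows.append((n, kriging_variance(x, design), bound))

for n, var, bound in rows:
    print(f"n={n:3d}  variance={var:.3e}  bound={bound:.3e}")
```

In this example the kriging variance stays below the bound for every $n$ considered, and both quantities shrink as the design points approach $x$.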
As explained by Vazquez and Bect (2009), it is sometimes important to work with
covariance functions such that the converse holds. That leads to our first open issue, 
which will be discussed in Section 3: 

Problem 1. Find necessary and sufficient conditions on a continuous covariance $k$
such that $\sigma^2(x;\underline{x}_n) \to 0$ implies that $x$ is adherent to $\{x_n, n \ge 1\}$.

Moreover, since strong convergence in $\mathcal{H}^*$ implies weak convergence in $\mathcal{H}^*$,
we have

$$\lim_{n\to\infty} \sigma^2(x;\underline{x}_n) = 0 \;\Longrightarrow\;
  \forall f \in \mathcal{H},\ \lim_{n\to\infty} \langle \lambda_{n,x}, f \rangle = \langle \delta_x, f \rangle = f(x). \tag{2}$$

Therefore, if $x$ is adherent to $\{x_n, n \ge 1\}$, pointwise consistency holds for all sample
paths $f \in \mathcal{H}$. However, this result is not satisfying from a Bayesian point of view
since $P\{\xi \in \mathcal{H}\} = 0$ if $\xi$ is Gaussian (see, e.g., Lukic and Beder, 2001, Driscoll's
theorem). In other words, modeling $f$ as a Gaussian process means that $f$ cannot be
expected to belong to $\mathcal{H}$. This leads to our second problem:






Emmanuel Vazquez and Julien Bect



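Problem 2. For a given covariance function $k$, describe the set of functions $\mathcal{G}$ such
that, for all sequences $(x_n)_{n \ge 1}$ in $\mathbb{R}^d$ and all $x \in \mathbb{R}^d$,

$$\lim_{n\to\infty} \sigma^2(x;\underline{x}_n) = 0 \;\Longrightarrow\;
  \forall f \in \mathcal{G},\ \lim_{n\to\infty} \langle \lambda_{n,x}, f \rangle = f(x). \tag{3}$$

An important question related to this problem, to be discussed in Section 4, is to
know whether the set $\mathcal{G}$ contains the set $C(\mathbb{R}^d)$ of all continuous functions. Before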
proceeding, we can already establish a result which ensures that considering the 
kriging predictor is relevant from a Bayesian point of view. 

Theorem 1. If $\xi$ is Gaussian, then $\{\xi \notin \mathcal{G}\}$ is $P$-negligible.

Proof. If $\xi$ is Gaussian, it is well-known that $\hat\xi(x;\underline{x}_n) = E[\xi(x) \mid \mathcal{F}_n]$ a.s., where
$\mathcal{F}_n$ denotes the $\sigma$-algebra generated by $\xi(x_1), \ldots, \xi(x_n)$. Note that $(E[\xi(x) \mid \mathcal{F}_n])$
is an $L^2$-bounded martingale sequence and therefore converges, a.s. and in $L^2$-norm,
to a random variable (see, e.g., Williams, 1991). □

3 Pointwise consistency in $L^2$-norm and the No-Empty-Ball
property 

The following definition has been introduced by Vazquez and Bect (2009):

Definition 1. A random process $\xi$ has the No-Empty-Ball (NEB) property if, for all
sequences $(x_n)_{n \ge 1}$ in $\mathbb{R}^d$ and all $x \in \mathbb{R}^d$, the following assertions are equivalent:

i) $x$ is an adherent point of the set $\{x_n, n \ge 1\}$,

ii) $\sigma^2(x;\underline{x}_n) \to 0$ when $n \to +\infty$.

The NEB property implies that there can be no empty ball centered at $x$ if the
prediction error at $x$ converges to zero, hence the name. Since $k$ is continuous, the
implication 1.i $\Rightarrow$ 1.ii is true. Therefore, Problem 1 amounts to finding necessary
and sufficient conditions on $k$ for $\xi$ to have the NEB property.

Our contribution to the solution of Problem 1 will be twofold. First, we shall 
prove that the following assumption, introduced by Yakowitz and Szidarovszky 
(1985), is a sufficient condition for the NEB property: 

Assumption 2. The process $\xi$ is second-order stationary and has spectral density $S$,
with the property that $S^{-1}$ has at most polynomial growth.

In other words, Assumption 2 means that there exist $C > 0$ and $r \in \mathbb{N}^*$ such that
$S(u)\,(1 + |u|^r) \ge C$, almost everywhere on $\mathbb{R}^d$. Note that this is an assumption
on $k$, which prevents it from being too regular. In particular, the so-called Gaussian
covariance,

$$k(x,y) = s^2 e^{-\alpha \|x-y\|^2}, \qquad s > 0,\ \alpha > 0, \tag{4}$$

does not satisfy Assumption 2. In fact, and this is the second part of our contribution, 
we shall show that $\xi$ with covariance function (4) does not possess the NEB
property. Assumption 2 still allows consideration of a large class of covariance functions,
which includes the class of (non-Gaussian) exponential covariances

$$k(x,y) = s^2 e^{-\alpha \|x-y\|^{\beta}}, \qquad s > 0,\ \alpha > 0,\ 0 < \beta < 2, \tag{5}$$

and the class of Matérn covariances (popularized by Stein, 1999).
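To make the polynomial-growth condition concrete, the sketch below compares two one-dimensional spectral densities on a finite frequency grid. It assumes the Fourier convention $k(h) = \int S(u)\,e^{iuh}\,du$, under which the exponential covariance $s^2 e^{-\alpha|h|}$ has density $s^2\alpha / (\pi(\alpha^2 + u^2))$ and the Gaussian covariance $s^2 e^{-\alpha h^2}$ has density $s^2 e^{-u^2/(4\alpha)} / (2\sqrt{\pi\alpha})$. The function names, the exponent $r = 2$ and the grid are illustrative choices, and a finite grid can only suggest, not prove, the behaviour at infinity.

```python
import numpy as np

def spectral_exponential(u, s=1.0, alpha=1.0):
    """Spectral density of s^2 * exp(-alpha * |h|) in dimension one."""
    return s**2 * alpha / (np.pi * (alpha**2 + u**2))

def spectral_gaussian(u, s=1.0, alpha=1.0):
    """Spectral density of s^2 * exp(-alpha * h^2) in dimension one."""
    return s**2 * np.exp(-u**2 / (4 * alpha)) / (2 * np.sqrt(np.pi * alpha))

u = np.linspace(0.0, 30.0, 3001)   # frequency grid (illustrative range)
r = 2                              # candidate exponent in S(u) * (1 + |u|^r) >= C

exp_product = spectral_exponential(u) * (1 + np.abs(u) ** r)
gauss_product = spectral_gaussian(u) * (1 + np.abs(u) ** r)

exp_lower = exp_product.min()      # stays bounded away from zero
gauss_tail = gauss_product[-1]     # collapses towards zero at high frequency

print(f"exponential: min of S(u)(1+|u|^{r}) on the grid = {exp_lower:.4f}")
print(f"Gaussian:    S(u)(1+|u|^{r}) at u = {u[-1]:.0f} is {gauss_tail:.3e}")
```

With $\alpha = 1$ and $r = 2$ the product for the exponential covariance is constant, whereas for the Gaussian covariance the exponential decay of $S$ dominates any fixed power of $|u|$.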
To summarize, the main result of this section is: 

Proposition 1. 

i) If Assumption 2 holds, then $\xi$ has the NEB property.

ii) If $\xi$ has the Gaussian covariance given by (4), then $\xi$ does not possess the NEB
property.
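A finite-sample illustration of the contrast stated in Proposition 1 is sketched below, under stated assumptions: dimension one, unit variance, $\alpha = 1$, nested dyadic grids in $[0, 1]$ and a prediction point $x = 1.5$ that stays at distance $0.5$ from every design point. The function names are hypothetical. The design sizes are kept small because Gram matrices of the Gaussian covariance become numerically ill-conditioned very quickly, so this only hints at the asymptotic statement.

```python
import numpy as np

def gaussian_cov(x, y, alpha=1.0):
    """Gaussian covariance (4) with s = 1, in dimension one."""
    return np.exp(-alpha * (x - y) ** 2)

def exponential_cov(x, y, alpha=1.0):
    """Exponential covariance (5) with s = 1 and beta = 1, in dimension one."""
    return np.exp(-alpha * np.abs(x - y))

def kriging_variance(x, design, cov):
    """sigma^2(x; x_n) = k(x,x) - k_x^T K^{-1} k_x."""
    K = cov(design[:, None], design[None, :])
    kx = cov(design, x)
    return cov(x, x) - kx @ np.linalg.solve(K, kx)

x_out = 1.5                                               # outside [0, 1]
designs = [np.linspace(0.0, 1.0, n) for n in (3, 5, 9)]   # nested dyadic grids

gauss_var = [kriging_variance(x_out, d, gaussian_cov) for d in designs]
expo_var = [kriging_variance(x_out, d, exponential_cov) for d in designs]

for d, g, e in zip(designs, gauss_var, expo_var):
    print(f"n={len(d)}  Gaussian: {g:.3e}  exponential: {e:.3e}")
```

For the exponential covariance the variance at $x = 1.5$ does not move as the grid is refined (in this one-dimensional case only the nearest design point matters), while for the Gaussian covariance it keeps decreasing even though no design point comes closer to $x$.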

The proof of Proposition 1 is given in Section 5. To the best of our knowledge, 
finding necessary and sufficient conditions for the NEB property — in other words, 
solving Problem 1 — is still an open problem. 

4 Pointwise consistency for continuous sample paths 

An important question related to Problem 2 is to know whether the set $\mathcal{G}$ contains the
set $C(\mathbb{R}^d)$ of all continuous functions. Yakowitz and Szidarovszky (1985, Lemma
2.1) claim, but fail to establish, the following: 

Claim 1. Let Assumption 2 hold. Assume that $\{x_n, n \ge 1\}$ is bounded, and denote
by $X_0$ its (compact) closure in $\mathbb{R}^d$. Then, if $x \in X_0$,

$$\forall f \in C(\mathbb{R}^d), \quad \lim_{n\to\infty} \langle \lambda_{n,x}, f \rangle = f(x).$$

Their incorrect proof has two parts, the first of which is correct; it says in essence
that, if $x \in X_0$ (i.e., if $x$ is adherent to $\{x_n, n \ge 1\}$), then

$$\forall f \in \mathcal{S}(\mathbb{R}^d), \quad \lim_{n\to\infty} \langle \lambda_{n,x}, f \rangle = f(x), \tag{6}$$

where $\mathcal{S}(\mathbb{R}^d)$ is the vector space of rapidly decreasing functions$^1$. In fact, this
result stems from the weak convergence result (2), once it has been remarked that$^2$
$\mathcal{S}(\mathbb{R}^d) \subset \mathcal{H}$ under Assumption 2.



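1 Recall that $\mathcal{S}(\mathbb{R}^d)$ corresponds to those $f \in C^\infty(\mathbb{R}^d)$ for which

$$\sup_{|\nu| \le N}\ \sup_{x \in \mathbb{R}^d}\ (1 + |x|^2)^N\, |(D^\nu f)(x)| < \infty$$

for $N = 0, 1, 2, \ldots$, where $D^\nu$ denotes differentiation of order $\nu$.

2 Indeed, under Assumption 2, we have, for all $f \in \mathcal{S}(\mathbb{R}^d)$,

$$\|f\|^2_{\mathcal{H}} = c \int |\hat f(u)|^2\, S(u)^{-1}\, du
  \le \frac{c}{C} \int |\hat f(u)|^2\, (1 + |u|^r)\, du < +\infty,$$

where $\hat f$ is the Fourier transform of $f$ and $c > 0$ is a normalizing constant that
depends only on the Fourier transform convention (see, e.g., Wu and Schaback, 1993).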






The second part of the proof of Claim 1 is flawed because the extension of the
convergence result from $\mathcal{S}(\mathbb{R}^d)$ to $C(\mathbb{R}^d)$, on the ground that $\mathcal{S}(\mathbb{R}^d)$ is dense
in $C(\mathbb{R}^d)$ for the topology of the uniform convergence on compact sets, does not
work as claimed by the authors. To get an insight into this, let $f \in C(\mathbb{R}^d)$, and let
$(\phi_k) \in \mathcal{S}(\mathbb{R}^d)^{\mathbb{N}}$ be a sequence that converges to $f$ uniformly on $X_0$. Then we can
write

$$|\langle \lambda_{n,x}, f \rangle - f(x)|
  \le |\langle \lambda_{n,x}, f - \phi_k \rangle| + |\langle \lambda_{n,x} - \delta_x, \phi_k \rangle| + |\phi_k(x) - f(x)|$$

$$\le (1 + \|\lambda_{n,x}\|_{TV})\, \sup_{X_0} |f - \phi_k| + |\langle \lambda_{n,x} - \delta_x, \phi_k \rangle|,$$

where $\|\lambda_{n,x}\|_{TV} := \sum_{i=1}^{n} |\lambda^i(x;\underline{x}_n)|$ is the total variation norm of $\lambda_{n,x}$, also called the
Lebesgue constant (at $x$) in the literature of approximation theory. If we assume that
the Lebesgue constant is bounded by $K > 0$, then we get, using (6),

$$\limsup_{n\to\infty} |\langle \lambda_{n,x}, f \rangle - f(x)| \le (1 + K)\, \sup_{X_0} |f - \phi_k| \xrightarrow[k\to\infty]{} 0.$$

Conversely, if the Lebesgue constant is not bounded, the Banach-Steinhaus theorem
asserts that there exists a dense subset $G$ of $(C(\mathbb{R}^d), \|\cdot\|_\infty)$ such that, for all $f \in G$,
$\sup_{n \ge 1} |\langle \lambda_{n,x}, f \rangle| = +\infty$ (see, e.g., Rudin, 1987, Section 5.8).
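The Lebesgue constant at $x$ is straightforward to evaluate for a given design, since it is the sum of the absolute values of the kriging weights. The sketch below is a minimal illustration assuming a one-dimensional exponential covariance and an equispaced design; the names `exponential_cov` and `lebesgue_constant` are hypothetical, and nothing here addresses the open question of boundedness as $n$ grows.

```python
import numpy as np

def exponential_cov(x, y, alpha=1.0):
    """Exponential covariance exp(-alpha * |x - y|) in dimension one."""
    return np.exp(-alpha * np.abs(x - y))

def lebesgue_constant(x, design, cov=exponential_cov):
    """Total variation norm sum_i |lambda^i(x; x_n)| of the kriging weights at x."""
    design = np.asarray(design, dtype=float)
    K = cov(design[:, None], design[None, :])
    lam = np.linalg.solve(K, cov(design, x))
    return np.abs(lam).sum()

design = np.linspace(0.0, 1.0, 11)               # illustrative equispaced design
at_node = lebesgue_constant(design[3], design)   # weights reduce to a unit vector
between = lebesgue_constant(0.35, design)        # midpoint between two design points

print(f"Lebesgue constant at a design point: {at_node:.6f}")
print(f"Lebesgue constant at x = 0.35:       {between:.6f}")
```

The same function can be evaluated along a growing sequence of designs to explore empirically how the constant behaves for a covariance of interest.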

Unfortunately, little is known about Lebesgue constants in the literature of krig- 
ing and kernel regression. To the best of our knowledge, whether the Lebesgue con- 
stant is bounded remains an open problem — although there is empirical evidence 
in De Marchi and Schaback (2008) that the Lebesgue constant could be bounded in 
some cases. 

Thus, the best result that we can state for now is a corrected version of Lemma 2.1
of Yakowitz and Szidarovszky (1985).

Theorem 2. Let Assumption 2 hold. Assume that $\{x_n, n \ge 1\}$ is bounded, and denote
by $X_0$ its (compact) closure in $\mathbb{R}^d$. Then, for all $x \in X_0$, the following assertions are
equivalent:

i) $\forall f \in C(\mathbb{R}^d)$, $\lim_{n\to\infty} \langle \lambda_{n,x}, f \rangle = f(x)$,

ii) the Lebesgue constant at $x$ is bounded.

5 Proof of Proposition 1 

Assume that $x \in \mathbb{R}^d$ is not adherent to $\{x_n, n \ge 1\}$. Then, there exists a $C^\infty(\mathbb{R}^d)$
compactly supported function $f$ such that $f(x) \ne 0$ and $f(x_i) = 0$, $\forall i \in \{1, \ldots, n\}$.
For such a function, the quantity $\langle \lambda_{n,x}, f \rangle$ cannot converge to $f(x)$ since

$$\langle \lambda_{n,x}, f \rangle = \sum_{i=1}^{n} \lambda^i(x;\underline{x}_n)\, f(x_i) = 0 \ne f(x).$$

Under Assumption 2, $\mathcal{S}(\mathbb{R}^d) \subset \mathcal{H}$, as explained in Section 4. Thus, $f \in \mathcal{H}$, and it
follows that $\lambda_{n,x}$ cannot converge (weakly, hence strongly) to $\delta_x$ in $\mathcal{H}^*$. This proves
the first assertion of Proposition 1.

In order to prove the second assertion, pick any sequence $(x_n)_{n \ge 1}$ such that the
closure $X_0$ of $\{x_n, n \ge 1\}$ has a non-empty interior. We will show that $\sigma^2(x;\underline{x}_n) \to 0$
for all $x \in \mathbb{R}^d$. Then, choosing $x \notin X_0$ proves the claim.

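Recall that $\hat\xi(x;\underline{x}_n)$ is the orthogonal projection of $\xi(x)$ onto
$\mathrm{span}\{\xi(x_i),\ i = 1, \ldots, n\}$ in $L^2(\Omega,\mathcal{A},P)$. Using the fact that the mapping $\xi(x) \mapsto k(x,\cdot)$ extends
linearly to an isometry$^3$ from $\overline{\mathrm{span}}\{\xi(y),\ y \in \mathbb{R}^d\}$ to $\mathcal{H}$, we get that

$$\sigma(x;\underline{x}_n) = \|\xi(x) - \hat\xi(x;\underline{x}_n)\| = d_{\mathcal{H}}(k(x,\cdot), H_n),$$

where $d_{\mathcal{H}}$ is the distance in $\mathcal{H}$, and $H_n$ is the subspace of $\mathcal{H}$ generated by $k(x_i,\cdot)$,
$i = 1, \ldots, n$. Therefore

$$\lim_{n\to\infty} \sigma(x;\underline{x}_n) = \lim_{n\to\infty} d_{\mathcal{H}}(k(x,\cdot), H_n) = d_{\mathcal{H}}(k(x,\cdot), H_\infty),$$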

where $H_\infty = \overline{\bigcup_{n \ge 1} H_n}$. Any function $f \in H_\infty^\perp$ satisfies $f(x_i) = (f, k(x_i,\cdot))_{\mathcal{H}} = 0$ and
therefore vanishes on $X_0$, since $\mathcal{H}$ is a space of continuous functions. Corollary 3.9
of Steinwart et al. (2006) leads to the conclusion that $f = 0$ since $X_0$ has a
non-empty interior. We have proved that $H_\infty^\perp = \{0\}$, hence that $H_\infty = \mathcal{H}$ since $H_\infty$ is
a closed subspace. As a consequence, $\lim_{n\to\infty} \sigma(x;\underline{x}_n) = d_{\mathcal{H}}(k(x,\cdot), H_\infty) = 0$, which
completes the proof. □